NSF PAR Search | NSF Public Access Repository

Stable Minima of ReLU Neural Networks Suffer from the Curse of Dimensionality: The Neural Shattering Phenomenon

Liang, Tongtong; Qiao, Dan; Wang, Yu-Xiang; Parhi, Rahul (November 2025, Advances in Neural Information Processing Systems)

We study the implicit bias of flatness (i.e., low curvature of the loss) and its effect on generalization in two-layer overparameterized ReLU networks with multivariate inputs, a problem well motivated by the minima-stability and edge-of-stability phenomena in gradient-descent training. Existing work either requires interpolation or focuses only on univariate inputs. This paper presents new and somewhat surprising theoretical results for multivariate inputs. In two natural settings, (1) the generalization gap of flat solutions and (2) the mean-squared error (MSE) of stable minima in nonparametric function estimation, we prove upper and lower bounds establishing that, while flatness does imply generalization, the resulting rates of convergence necessarily deteriorate exponentially as the input dimension grows. This gives an exponential separation between flat solutions and low-norm solutions (i.e., weight decay), which are known not to suffer from the curse of dimensionality. In particular, our minimax lower bound construction, based on a novel packing argument with boundary-localized ReLU neurons, reveals how flat solutions can exploit a kind of "neural shattering", in which neurons rarely activate but have large weight magnitudes, which leads to poor performance in high dimensions. We corroborate these theoretical findings with extensive numerical simulations. To the best of our knowledge, our analysis provides the first systematic explanation for why flat minima may fail to generalize in high dimensions.
Free, publicly-accessible full text available November 30, 2026
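The "neural shattering" described above involves neurons that rarely activate. The minimal Python sketch below illustrates one related geometric effect under assumptions that are ours, not the paper's: inputs drawn uniformly from the unit ball in R^d, and a single ReLU neuron with direction w = e_1 and a fixed bias. It estimates how often relu(<w, x> - bias) is nonzero as d grows. It is not the paper's packing construction, and `sample_unit_ball` and `activation_rate` are hypothetical helper names.

```python
import math
import random


def sample_unit_ball(d, rng):
    """Draw one point uniformly from the unit ball in R^d."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(t * t for t in g))
    # The radius of a uniform point has CDF r^d, so invert it: r = U^(1/d).
    radius = rng.random() ** (1.0 / d)
    return [radius * t / norm for t in g]


def activation_rate(d, bias, n_samples=10000, seed=0):
    """Monte Carlo estimate of P(relu(<w, x> - bias) > 0) with w = e_1."""
    rng = random.Random(seed)
    active = 0
    for _ in range(n_samples):
        x = sample_unit_ball(d, rng)
        # With w = e_1 the pre-activation is the first coordinate minus the bias.
        if x[0] - bias > 0.0:
            active += 1
    return active / n_samples


for d in (2, 8, 32):
    print(d, activation_rate(d, bias=0.5))
```

Because the first coordinate of a uniform point in the d-dimensional ball concentrates near zero as d grows, the estimated activation rate for a fixed bias of 0.5 should shrink quickly with d.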
Weighted variation spaces and approximation by shallow ReLU networks

https://doi.org/10.1016/j.acha.2024.101713

DeVore, Ronald; Nowak, Robert D; Parhi, Rahul; Siegel, J W (January 2025, Applied and Computational Harmonic Analysis)

Full Text Available
Variation Spaces for Multi-Output Neural Networks: Insights on Multi-Task Learning and Network Compression

Shenouda, Joseph; Parhi, Rahul; Lee, Kangwook; Nowak, Robert (June 2024, Journal of Machine Learning Research)

This paper introduces a novel theoretical framework for the analysis of vector-valued neural networks through the development of vector-valued variation spaces, a new class of reproducing kernel Banach spaces. These spaces emerge from studying the regularization effect of weight decay in training networks with activation functions like the rectified linear unit (ReLU). This framework offers a deeper understanding of multi-output networks and their function-space characteristics. A key contribution of this work is the development of a representer theorem for the vector-valued variation spaces. This representer theorem establishes that shallow vector-valued neural networks are the solutions to data-fitting problems over these infinite-dimensional spaces, where the network widths are bounded by the square of the number of training data points. This observation reveals that the norm associated with these vector-valued variation spaces encourages the learning of features that are useful for multiple tasks, shedding new light on multi-task learning with neural networks. Finally, this paper develops a connection between weight-decay regularization and the multi-task lasso problem. This connection leads to novel bounds for layer widths in deep networks that depend on the intrinsic dimensions of the training data representations. This insight not only deepens the understanding of deep network architectural requirements, but also yields a simple convex optimization method for deep neural network compression. The performance of this compression procedure is evaluated on various architectures.
Full Text Available
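The connection between weight decay and a multi-task-lasso-style penalty can be illustrated with the standard rescaling argument for ReLU networks, sketched below in Python. Assumptions (ours, not the paper's): biases are omitted, each neuron has a nonzero input weight w_k and a nonzero vector output weight v_k, and the penalty is one half of the sum of squared Euclidean norms. Since relu(a t) = a relu(t) for a > 0, scaling w_k by a and v_k by 1/a leaves the network unchanged, and the best such rescaling turns the weight-decay penalty into sum_k ||w_k|| ||v_k||. This is a generic illustration rather than the paper's derivation, and all names and weights are hypothetical.

```python
import math


def relu(t):
    return max(0.0, t)


def norm(u):
    return math.sqrt(sum(t * t for t in u))


def forward(x, W, V):
    """Vector-valued shallow ReLU net: f(x) = sum_k V[k] * relu(<W[k], x>)."""
    out = [0.0] * len(V[0])
    for w, v in zip(W, V):
        h = relu(sum(wi * xi for wi, xi in zip(w, x)))
        for j, vj in enumerate(v):
            out[j] += vj * h
    return out


def weight_decay(W, V):
    """One half of the sum of squared Euclidean norms of all weights."""
    return 0.5 * sum(norm(w) ** 2 + norm(v) ** 2 for w, v in zip(W, V))


def balance(W, V):
    """Rescale each neuron so ||w_k|| = ||v_k||; ReLU homogeneity keeps f fixed."""
    W_new, V_new = [], []
    for w, v in zip(W, V):
        alpha = math.sqrt(norm(v) / norm(w))  # assumes w and v are nonzero
        W_new.append([alpha * t for t in w])
        V_new.append([t / alpha for t in v])
    return W_new, V_new


W = [[1.0, 2.0], [-0.5, 0.3]]            # hypothetical input weights (2 neurons, d = 2)
V = [[4.0, 0.0, 3.0], [0.1, 0.2, 0.2]]   # hypothetical output weights (3 tasks)
Wb, Vb = balance(W, V)
x = [0.7, -0.2]
print(forward(x, W, V), forward(x, Wb, Vb))  # agree up to rounding
print(weight_decay(W, V), weight_decay(Wb, Vb))
print(sum(norm(w) * norm(v) for w, v in zip(W, V)))
```

After balancing, the penalty equals the sum of products of neuron norms. If the input weights are additionally normalized to unit length, this becomes a sum of Euclidean norms of the output vectors, which is the group-sparsity penalty of the multi-task lasso and tends to favor neurons shared across tasks.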
Deep Learning Meets Sparse Regularization: A signal processing perspective

https://doi.org/10.1109/MSP.2023.3286988

Parhi, Rahul; Nowak, Robert D (September 2023, IEEE Signal Processing Magazine)

Full Text Available
A Continuous Transform for Localized Ridgelets

Shenouda, Joseph; Parhi, Rahul; Nowak, Robert (July 2023, Fourteenth International Conference on Sampling Theory and Applications)

Full Text Available
A Continuous Transform for Localized Ridgelets

https://doi.org/10.1109/SampTA59647.2023.10301398

Shenouda, Joseph; Parhi, Rahul; Nowak, Robert D (July 2023, IEEE)

Full Text Available
Near-Minimax Optimal Estimation With Shallow ReLU Neural Networks

https://doi.org/10.1109/TIT.2022.3208653

Parhi, Rahul; Nowak, Robert D (February 2023, IEEE Transactions on Information Theory)

We study the problem of estimating an unknown function from noisy data using shallow ReLU neural networks. The estimators we study minimize the sum of squared data-fitting errors plus a regularization term proportional to the squared Euclidean norm of the network weights. This minimization corresponds to the common approach of training a neural network with weight decay. We quantify the performance (mean-squared error) of these neural network estimators when the data-generating function belongs to the second-order Radon-domain bounded variation space. This space of functions was recently proposed as the natural function space associated with shallow ReLU neural networks. We derive a minimax lower bound for the estimation problem for this function space and show that the neural network estimators are minimax optimal up to logarithmic factors. This minimax rate is immune to the curse of dimensionality. We quantify an explicit gap between neural networks and linear methods (which include kernel methods) by deriving a linear minimax lower bound for the estimation problem, showing that linear methods necessarily suffer the curse of dimensionality in this function space. As a result, this paper sheds light on the phenomenon that neural networks seem to break the curse of dimensionality.
Full Text Available
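The estimators described above minimize a sum of squared data-fitting errors plus a term proportional to the squared Euclidean norm of the weights. The minimal Python sketch below evaluates such an objective. The parameterization f(x) = sum_k v_k relu(<w_k, x> - b_k) + c, the choice to leave the biases b_k and c unpenalized, and all function names are assumptions for illustration; the sketch only evaluates the objective and does not train the network.

```python
def relu(t):
    return max(0.0, t)


def network(x, W, b, v, c):
    """Shallow ReLU network f(x) = sum_k v[k] * relu(<W[k], x> - b[k]) + c."""
    return sum(vk * relu(sum(wi * xi for wi, xi in zip(wk, x)) - bk)
               for wk, bk, vk in zip(W, b, v)) + c


def objective(X, y, W, b, v, c, lam):
    """Squared-error data fit plus lam times the squared norm of W and v."""
    fit = sum((yi - network(xi, W, b, v, c)) ** 2 for xi, yi in zip(X, y))
    # Assumption: only input weights W and output weights v are penalized.
    penalty = sum(sum(t * t for t in wk) for wk in W) + sum(vk * vk for vk in v)
    return fit + lam * penalty


# Hypothetical data that the network below fits exactly: y = x1 + x2.
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [1.0, 1.0, 2.0]
W, b, v, c = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [1.0, 1.0], 0.0
for lam in (0.0, 0.1, 1.0):
    print(lam, objective(X, y, W, b, v, c, lam))
```

Here the data-fit term is zero, so the objective is entirely the weight-decay term and grows linearly in lam; a trained estimator would trade the two terms off against each other.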
What Kinds of Functions Do Deep Neural Networks Learn? Insights from Variational Spline Theory

https://doi.org/10.1137/21M1418642

Parhi, Rahul; Nowak, Robert D. (June 2022, SIAM Journal on Mathematics of Data Science)

Full Text Available
Banach Space Representer Theorems for Neural Networks and Ridge Splines

Parhi, Rahul; Nowak, Robert D (February 2021, Journal of Machine Learning Research)

We develop a variational framework to understand the properties of the functions learned by neural networks fit to data. We propose and study a family of continuous-domain linear inverse problems with total-variation-like regularization in the Radon domain subject to data-fitting constraints. We derive a representer theorem showing that finite-width, single-hidden-layer neural networks are solutions to these inverse problems. Drawing on many techniques from variational spline theory, we propose the notion of polynomial ridge splines, which correspond to single-hidden-layer neural networks with truncated power functions as the activation function. The representer theorem is reminiscent of the classical reproducing kernel Hilbert space representer theorem, but we show that the neural network problem is posed over a non-Hilbertian Banach space. While the learning problems are posed in the continuous domain, similar to kernel methods, the problems can be recast as finite-dimensional neural network training problems. These neural network training problems have regularizers that are related to the well-known weight decay and path-norm regularizers. Thus, our result gives insight into the functional characteristics of trained neural networks and into the design of neural network regularizers. We also show that these regularizers promote neural network solutions with desirable generalization properties. Keywords: neural networks, splines, inverse problems, regularization, sparsity
Full Text Available
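The polynomial ridge splines described above are single-hidden-layer networks whose activation is a truncated power function. The Python sketch below is a minimal illustration under our own assumptions: the activation is (t)_+^degree with an integer `degree` (degree = 1 gives the ReLU), and the normalizing constants and low-degree polynomial term that the paper's definition may include are omitted. Because (a t)_+^m = a^m (t)_+^m for a > 0, scaling a neuron's (w, b) by a and its output weight by a^(-m) leaves the function unchanged; the hypothetical `path_norm` helper computes a path-norm-like quantity, sum_j |v_j| ||w_j||^m, that is invariant under this rescaling. It is not claimed to be the paper's exact regularizer.

```python
import math


def truncated_power(t, degree):
    """(t)_+^degree: zero for t <= 0 and t**degree otherwise; degree=1 is the ReLU."""
    return t ** degree if t > 0.0 else 0.0


def ridge_spline(x, W, b, v, degree):
    """Single-hidden-layer network sum_j v[j] * (<W[j], x> - b[j])_+^degree."""
    total = 0.0
    for wj, bj, vj in zip(W, b, v):
        z = sum(wi * xi for wi, xi in zip(wj, x)) - bj
        total += vj * truncated_power(z, degree)
    return total


def path_norm(W, v, degree):
    """sum_j |v[j]| * ||W[j]||_2 ** degree, invariant under neuron rescaling."""
    return sum(abs(vj) * math.sqrt(sum(t * t for t in wj)) ** degree
               for wj, vj in zip(W, v))


def rescale_neuron(W, b, v, j, alpha, degree):
    """Scale neuron j's (w, b) by alpha > 0 and its output weight by alpha**-degree."""
    W2 = [list(w) for w in W]
    b2 = list(b)
    v2 = list(v)
    W2[j] = [alpha * t for t in W2[j]]
    b2[j] = alpha * b2[j]
    v2[j] = v2[j] / alpha ** degree
    return W2, b2, v2


# Hypothetical cubic ridge spline with two neurons in R^2.
W, b, v = [[1.0, -2.0], [0.5, 0.5]], [0.1, -0.3], [1.5, -0.7]
x = [0.9, 0.1]
W2, b2, v2 = rescale_neuron(W, b, v, 0, 2.0, 3)
print(ridge_spline(x, W, b, v, 3), ridge_spline(x, W2, b2, v2, 3))
print(path_norm(W, v, 3), path_norm(W2, v2, 3))
```

Both printed pairs agree up to rounding, which is why regularizers built from this kind of product of weight norms depend only on the function being represented neuron by neuron, not on how each neuron is scaled.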
The Role of Neural Network Activation Functions

https://doi.org/10.1109/LSP.2020.3027517

Parhi, Rahul; Nowak, Robert D (January 2020, IEEE Signal Processing Letters)

A wide variety of activation functions have been proposed for neural networks. The Rectified Linear Unit (ReLU) is especially popular today. There are many practical reasons that motivate the use of the ReLU. This paper provides new theoretical characterizations that support the use of the ReLU, its variants such as the leaky ReLU, and other activation functions, in the case of univariate, single-hidden-layer feedforward neural networks. Our results also explain the importance of commonly used strategies in the design and training of neural networks such as “weight decay” and “path-norm” regularization, and provide a new justification for the use of “skip connections” in network architectures. These new insights are obtained through the lens of spline theory. In particular, we show how neural network training problems are related to infinite-dimensional optimization problems posed over Banach spaces of functions whose solutions are well known to be fractional and polynomial splines, where the particular Banach space (which controls the order of the spline) depends on the choice of activation function.
Full Text Available
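The spline view of univariate, single-hidden-layer ReLU networks described above can be checked numerically. In the minimal Python sketch below, which uses our own hypothetical parameterization and names, f(x) = sum_k v_k relu(w_k x - b_k) + slope * x + intercept, where the affine term stands in for a skip connection. The output is a continuous piecewise-linear spline with knots at b_k / w_k, and the slope of f jumps by v_k |w_k| at each knot, so when the knots are distinct the total absolute slope change equals the path-norm-like sum of |v_k w_k|.

```python
def relu(t):
    return max(0.0, t)


def univariate_net(x, w, b, v, slope, intercept):
    """f(x) = sum_k v[k] * relu(w[k] * x - b[k]) + slope * x + intercept.

    The affine term plays the role of a skip connection.
    """
    return (sum(vk * relu(wk * x - bk) for wk, bk, vk in zip(w, b, v))
            + slope * x + intercept)


def knots(w, b):
    """Breakpoints of the piecewise-linear output: x = b[k] / w[k] for w[k] != 0."""
    return sorted(bk / wk for wk, bk in zip(w, b) if wk != 0.0)


def slope_jump(x0, w, b, v, slope, intercept, h=1e-6):
    """Finite-difference estimate of (right slope - left slope) of f at x0."""
    def f(x):
        return univariate_net(x, w, b, v, slope, intercept)
    right = (f(x0 + 2 * h) - f(x0 + h)) / h
    left = (f(x0 - h) - f(x0 - 2 * h)) / h
    return right - left


# Hypothetical three-neuron network with a skip connection.
w, b, v = [1.0, -2.0, 0.5], [0.5, 1.0, 0.5], [2.0, 1.5, -1.0]
slope, intercept = 0.3, -0.1
print(knots(w, b))  # knots at -0.5, 0.5, 1.0
jumps = [slope_jump(t, w, b, v, slope, intercept) for t in knots(w, b)]
print(jumps)
print(sum(abs(j) for j in jumps), sum(abs(vk * wk) for vk, wk in zip(v, w)))
```

The skip connection contributes the same slope on both sides of every knot, so it never affects the jumps; it lets the network represent an affine trend without paying for it in the path-norm-like term.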
